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Abstract 



The binding of small metal ions to complex macromolecular structures is typically dominated by 
strong local interactions of the ion with its nearest ligands. For this reason, it is often possible 
to understand the microscopic origin of ion binding selectivity by considering simplified reduced 
models comprised of only the nearest ion-coordinating ligands. Although the main ingredients 
underlying simplified reduced models are intuitively clear, a formal statistical mechanical treatment 
is nonetheless necessary in order to draw meaningful conclusions about complex macromolecular 
systems. By construction, reduced models only treat the ion and the nearest coordinating ligands 
explicitly. The influence of the missing atoms from the protein or the solvent is incorporated 
indirectly. Quasi-chemical theory offers one example of how to carry out such a separation in the 
case of ion solvation in bulk liquids, and in several ways, a statistical mechanical formulation of 
reduced binding site models for macromolecules is expected to follow a similar route. Here, some 
critical issues with recent theories of reduced binding site models are examined. 



1. Introduction 



Small metal ions are a fundamental component to the structure and function of biological systems. 
Although detailed computation have a central role to play in trying to understand the molecular 
determinants of ion selectivity in these complex systems, progress may be achieved by pursuing the- 
oretical studies based on simplified reduced models comprised of only the nearest ion-coordinating 
ligands [Il[2l[3llll[5llS|7l[8l[9l[l0l[IIl[l2l[l3]. By construction, such reduced binding site models 
only treat the ion and the nearest coordinating ligands explicitly, while the influence of the missing 
atoms from the surroundings (protein or solvent) is incorporated indirectly. Although the general 
idea is conceptually simple, a rigorous statistical mechanical treatment is necessary to draw mean- 
ingful conclusions about complex macromolecular systems. A typical starting point is to separate 
the system into "inner" and "outer" regions. Quasi-chemical theory (QCT) offers one example of 
how to carry out this type of formal separation in the case of an ion in bulk solvent [14| [T5| [T6] . 
The system is rigorously partitioned into a small fixed spherical volume comprising the ion and n 
solvent molecules, and a second region corresponding to the remaining bulk liquid phase [14 ^1151 [T6] . 
Thermodynamic ion solvation properties are then expressed as a sum over a series of clusters com- 
prising the ion and n water molecules, each treated at the QM level, and each explicitly weighted 
by the probability P{n). QCT was shown to be formally equivalent to a treatment of ion solvation 
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based on the small system grand canonical ensemble (SSGCE) [T6]. A particularly simple form 
of QCT, referred to as the primitive QCT (pQCT), evaluates configurational integrals using the 
rigid rotor harmonic oscillator (RRHO) approximation. The studies of Varma and Rempe have 
relied on pQCT fHIS], while those of Bostick and Brooks were formulated from SSGCE [8]. Both 
approaches are constructed by first considering the probability of finding n ligands in a spherical 
region, a standard procedure in the development of theories of solvation in bulk liquids pT l fTS l fTB]. 
However, both approaches encounter similar difficulties because the statistical properties of the 
ligands of a protein binding site were assimilated with those of a bulk liquid. In the following, we 
critically examine some of the issues arising in the context of QCT and SSGCE. 

2. Analysis 

a) Primitive Quasi-Chemical Theory (pQCT) 

The primitive form of QCT (pQCT) was used to examine the factors governing selectivity in ion 
channels jU [5]. To clarify the significance of pQCT when applied to ion channels, it is useful to 
recall the underlying theoretical framework. QCT is a statistical mechanical theory in which the 
influence of the solvent molecules surrounding an ion in a bulk liquid is separated into near and 
distant contributions |15i [16] . The construction of pQCT of ion solvation is illustrated schematically 
in Fig. lA. 




Figure 1: Schematic illustrations of primitive quasi-chemical theory (pQCT) in different situations. (A) Application 
of pQCT to the hydration of a small ion. One the left is shown an ion in bulk water. The dashed line indicates 
the position of the inner-shell region. On the right, the pQCT cluster sum appearing in Eq. ^ is depicted for 
n = 0, 1, 2, 3, . . .. The water molecules surrounding the ion in the inner-shell are represented explicitly, whereas those 
in the outer shell are represented as a continuum of density p. (B) Application of pQCT to an ion binding site in 
a protein On the left is shown the ion with its coordinating ligands in the protein binding site. The ligands are 
attached to the protein, which is a covalently connected structure. On the right, a putative pQCT construct of the 
same system is depicted, with the ion-coordinating ligands explicitly represented and the outer region incorporated 
implicitly as a generic liquid continuum (of density "p?"). 

Central to QCT is the definition of a spherical inner-shell region of radius R centered on the ion. 
A possible route for deriving QCT is to sort out the multitude of possible configurational integrals 
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according to the number of solvent molecules occupying the inner-shell region, and then treat the 
latter as a small open system in equilibrium with a bulk region (density p) composed of identical 
solvent molecules that are free to exchange position with the bulk [16]. The statistical mechanical 
developments bear many similarities to the concept of SSGCE considered by Reiss and co-workers 
[T7] . QCT adopts a particularly simple form once it is assumed that the clusters undergo small 
harmonic fluctuations around a single energy minimum. This leads to the so-called "primitive" 
QCT approximation (pQCT), which provides the following expression for the solvation free energy 
of an ion i in a liquid [16j , 



(1) 



where the sum pertains to small clusters comprising one ion surrounded by n solvent molecules. 
Here, Pf{Ri) is the probability of finding an empty spherical cavity of radius Ri in the bulk liquid, 
p is the density of the bulk liquid, A/i is the excess chemical potential of a solvent molecules in 
the bulk liquid, £'™™(n) is the binding energy minimum of the cluster (optimal energy-minimized 
geometry), ATVj is the solvation free energy of the cluster (coupling with the surrounding bulk 
phase), and K\'"{n) is a product of standard translation-rotation-vibration partition functions 
for the cluster in the gas phase. Quantities such as A/i and APFj are often approximated via a 
continuum dielectric model [181 [El EOl ES] , although this is not necessary. 

pQCT was utilized in several studies of hydration of Li^, Na'^ and ^I8l [T9l [20j . for which 
it was specifically designed. Those studies, which combined pQCT with quantum mechanical 
methods, are of great interest. Nevertheless, they did not provide a quantitative assessment of the 
accuracy of pQCT. To achieve this goal, we compared the outcome of pQCT with "exact" results 
from MD simulations using simple classical models of water and ions [16]. The comparison shows 
that the results of pQCT about the hydration of those small metal cations must be interpreted 
with caution because the accuracy of the approximation varies for ions of different size [16]; Li"*" 
is represented with excellent quantitative accuracy, while Na"*" is described less accurately and K"*" 
encounters some serious difficulties. These tests showed that even when the pQCT approximation 
yields reasonable solvation free energies, it could be quantitatively unreliable regarding average 
coordination numbers |16j . 

Although pQCT and Eq. ^ were derived to describe ion solvation in a bulk liquid, the approach 
was also used to study ion selectivity in proteins binding sites ^ |5j . As illustrated schematically in 
Fig. IB, the application of pQCT to proteins raises a number of issues because this is a situation 
for which many of the assumptions used to derive Eq. ([T]) are not satisfied. For example, the 
mathematical developments leading to Eq. ([T]) specifically rely on the fact that solvent molecules 
are identical and free to exchange with one another in the bulk phase (see Eqs. 6-13 in ref [16]). 
Eq. (g is an expression that is relevant to an ion in bulk solvent, not in a protein binding site. 
The only information about the protein structure that can be incorporated into pQCT is to restrict 
the sum over n in Eq. ([T]) to a single coordination number. An application of pQCT and Eq. ([T]) 
to discuss ion binding to a protein such as the selectivity filter of the KcsA K"*" channel is thus 
of unclear significance because the coordinating ligands are tethered to the polypeptide backbone. 
In practice, to apply Eq. ([l]) to KcsA, one must "map" the properties of the selectivity filter onto 
those of a bulk liquid, i.e., p and A/i, the density and excess chemical potential. In their study 
of the KcsA channel, acetamide molecules were used to represent the backbone carbonyl groups of 
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the selectivity filter j3], a chemically reasonable choice. But what value of p and A/u, should be 
used in Eq. ([T]) to best represent the selectivity filter of the KcsA channel is unclear. Should one 
estimate those parameters from the properties of liquid acetamide? Or should one try to derive 
some effective value of those parameters mimicking the interior of an average membrane protein? 
How is the influence of the relative flexibility of the surrounding protein structure incorporated 
into the theory? Furthermore, the results depend on the radius of the inner shell, chosen for a 
particular ion. These critical issues were not addressed in previous studies of ion selectivity based 
on pQCT [4]. 



b) Attempts to extend QCT to inhomogeneous systems 

An extension of QCT for inhomogeneous systems applicable to protein binding sites was recently 
proposed [12]. The framework was only elaborated schematically in the form of the thermodynamic 
cycle shown in Fig. [2] To facilitate a discussion of ion binding thermodynamics, it is therefore 
necessary to translate the states appearing in this scheme into standard mathematical statistical 
mechanical expressions (all configurational integrals will be written in the NVT ensemble for the 
sake of simplicity). 

The main difference from previous versions of QCT developed for homogeneous fluids is the 
change in the ordering of ligand removal and prearrangement of the coordination structure con- 
straint function. The non-covalent transfer of an ion into the protein binding site corresponds the 
equilibrium steps A— s-F. At its base, the scheme is design to contain a gas-phase ion-ligands cluster 
formation free energy in step step C— t-D that is typical of QCT. The schematic thermodynamic 
cycles is designed to permit use of gas-phase QM calculations for cluster formation energies. In 
the cycle, the n-coordination structure specified by Cn is formed, then its interaction with the 
surrounding medium are removed to bring it into the gas phase before turning on solute-to-ligand 
interactions. The ambiguity of "the complex is formed" must be resolved in practice with an in- 
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Figure 2: Schematic illustration of an extension of quasi-chemical theory (QCT) for inhomogeneous systems 
applicable to protein binding sites (directly taken from the Figure 2 of reference [12]). The intermediate stages shown 
schematically are specified by enclosing noninteracting molecules with dotted lines, and indicating coordination and 
excluded volume conditions by shading. While this was not specified, it is presumed that the green background 
represents the inhomogeneous environment (including that of the protein) surrounding the ligands and the ion. 

dicator function, /(R;Cn), which is 1 whenever a particular point in phase space satisfies specific 
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local structural requirements and zero otherwise. The indicator function is designed to include 
constraints of coordination and excluded volume conditions (i.e., preventing moieties from the 
surroundings to directly interfere with the specified n-coordination structure). In Fig. [2| this is 
indicated by "white shading", but is left otherwise unspecified. There is a wide range of possi- 
bilities to construct practical indicator function. For example, coordination might be defined by 
counting the ligands within a maximum distance of 3.5 A away from the ion. Alternately, one may 
choose a definition that is based on a maximum root mean squared deviation tolerance (RMSD) 
with respect to an ideal coordination complex geometry. Different constraints can be defined for 
each coordination complex of interest for a given system. This generalization is appropriate for 
discussing chemical reactions occurring in arbitrary, inhomogeneous environments due to the nat- 
ural appearance of the environmental potential of mean force (PMF) when calculating partition 
functions with coordination constraints. 

It is helpful to first consider the equilibrium 
between the end-states A and F depicted in Fig. [2] 
summarized in Fig. [3j Here, a "blue line" was 
added to remind ourselves that a macromolecule 
is part of the inhomogeneous "green background" 
environment, and that the ligands are covalently 
tethered to it. The state A is meant to represent 
the configurational integral Za in which the ion 
is noninteracting ion. 





J apo 



Figure 3: Equilibrium binding of one ion into a binding 
site with the ligands La . The dotted lines around the ion 
X in state A indicates that the particle is noninteracting. 
The blue line represents the protein structure to which 
the ligands are covalently bonded. Adapted from 
Figure 2 of reference I12j. 



where R represents all the coordinates in the system, and is the potential energy when the ion 
is noninteracting. The state F is meant to represent the configurational integral Zp of the ion in 
the protein binding site, 



Zp 



holo 



(3) 



The actual equilibrium binding constant of the ion, K\^^ can be expressed in terms of the ratio 
ZfIZa as. 



Za 



(4) 



where Gbuik is the free energy to transfer the ion X from the bulk solution to the gas phase (absolute 
hydration free energy of the ion) , and V is the volume of the system spanned by the noninteracting 
ion {V = /drion). 

Four additional intermediate states, B-E, are inserted between the end-point states A and F 
in Fig. [2] They correspond to arbitrary constructs used to define a step-by-step procedure to go 
between the two end-states A and F, and thus compute the total binding free energy of the ion. 
To facilitate the analysis, we distinguish three main atomic and molecular components in the total 
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potential energy of the system U{Ii): the ion (i), the ion-coordinating Hgands (1), and the rest of 
the system (r). The latter includes the rest of the protein as well as the surrounding bulk solvent. 
The system is separated into an inner and an outer region. The ion and the n coordinating ligands 
are part of the inner region, while the rest of the system is part of the outer region. Let the 
coordinates of the ion and the n coordinating ligand be represented by X = {rion, xi, . . . , x„}, and 
those for the remaining atoms be represented by Y, and R = X, Y. The total potential energy U 
follows naturally from this decomposition and can be expressed as the sum: 

n 

C/(R) = nii(X)+nir(X,Y) + ^nj(xi) + nii(X) + ui,(X,Y) + n,,(Y) (5) 

i=l 

where the separate energy terms correspond to: ion-ligand (il), ion-rest (ir), isolated ligand (1), 
ligand-ligand (11), ligand-rest (Ir), and rest-rest (rr). Such a form arises naturally for a pairwise de- 
composable molecular mechanical force field, although it is always possible to formally re-construct 
the total potential energy as such a sum of separate terms, even in the context of a non-pairwise 
additive quantum mechanical energy surface. We can now try to express mathematically all the 
intermediate steps, B, C, D, E of Fig. [2] introduced in ref |12j . 

State B has both the ion X and the ligands being constrained 
within coordination and excluded volume conditions (surrounded by 
white shaded area). The ion X (surrounded by dotted line) is noninteract- 
ing and decoupled from its surroundings, while the ligands are interacting 
normally. The configurational integral Zb is, 



ZB = JdX dYI{K;Cn) e-4E:=i"i(-0+«n(x)+«,.(x,Y)+«..(Y)] 



State C has both the ion X and the ligands constrained within coor- 
dination and excluded volume conditions (surrounded by white shaded 
area). The ion and the ligands (surrounded by dotted lines) are fully de- 
coupled from the surroundings (they are not interacting with anything). 
The configurational integral Zq is, 

Zc = JdX dYI{-R; Cn) g-'^ELi 

State D has both the ion X and the ligands constrained within coor- 
dination and excluded volume conditions (surrounded by white shaded 
area). They are all surrounded by a single dotted line, which implies 
that the ion and the ligands interact together, but are decoupled from 
the surrounding. The configurational integral Z^ is. 



Zn = JdX dY I(R; Cn) e-4""(X)+E:=i nj(x,)+nn(X)+«„(Y)] 
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State E has both the ion X and the hgands constrained within coor- 
dination and excluded volume conditions (surrounded by white shaded 
area). The ion and the ligands are not surrounded by a dotted line, which 
implies that they are fully interacting and coupled to the surroundings. 
The configurational integral Ze is, 

Ze = J dX dY /(R; (7„) e-^hi(^)+"-(^'Y)+Er=i «j(x,)+nn(x)+«i,(x,Y)+«„(Y)] 

The meaning of the intermediate steps can be summarized in the following. At the heart of 
the cycle, step C— s-D is the gas-phase ion-ligands cluster formation that is typical of QCT. Since 
this step is an isolated cluster, the plan is to evaluate the configurational integrals using the rigid 
rotor, harmonic oscillator (RRHO) approximation, which is often employed in QM thermodynamic 
calculations, 

Zd /dX/(X;Cn) e-^KiW+Er=i«j(''')+""(x)] 



J dX I(X; Cn) e-'^E-i 
= e-P'^f-'i'-) Kl"{n) (10) 

where Ef^™{n) is the binding energy for the formation of the isolated cluster (optimal energy- 
minimized geometry in vacuum). While the geometry of the cluster is optimized, it remains re- 
stricted to a specific n coordination state Cn via the indicator function /(X;Cn). The quantity 
Kl" (n) corresponds to modified gas phase equilibrium translation-rotation- vibration partition func- 
tions for the cluster assembly of one ion with n ligands while they are spatially restricted via the 
function /(X, C„); note that /(X;C„) was written as a function of X without its dependency on 
the coordinates of the rest of the system, Y, since the ion-ligands cluster is assumed to be isolated 
in the gas phase. 

The two steps A— )-B at the beginning of the cycle, and E— )-F at the end of the cycle, consist 
in imposing the coordination structure C„ to the ion-ligands subsystem. Step A— )-B is a process 
imposing the coordination structure constraint on the fully interacting ligands in the presence of a 
noninteracting ion, 

Zb _ /(iX(iY/(R;C„) e-^Cr=i"i(''')+«ii(x)+«ir(x,Y)+«„(Y)] 
Ya ~ / dX dY e"'^Er=i «j(''»)+«ii(X)+«i.(x,Y)+«„{Y)] 

= (/(X;C„)) 

= P"P°(C„) (11) 

where P^^°{Cn) is the probability of finding the ligands coordination structure C„ when the ion is 
noninteracting. Step E— t-F is a similar process, imposing the coordination structure constraint on 
the fully interacting ion and ligands, 

Zp J dX dY e-/3hi(X)+""(X:Y)+X;Li «i{x,)+«n(X)+«i,(X,Y)+«„(Y)] 

~ / dX dY/(R; Cn) e-'^KW+^-^^'^^+ELi «j(''>)+«ii(^)+"i^(X'Y)+«"(Y)] 
1 



(/(R;C„)) 
1 



(12) 



7 



where P^°'°(C„) is the probabihty of finding the fully interacting ion and ligands in the coordination 
structure Cn- 

The two remaining steps, B— t-C and D— t-E, correspond to the process of coupling the subsystem 
(constrained according to the coordination structure Cn) to the surrounding. Step B— )-C is the 
process of de-coupling the ligands constrained according to the coordination structure Cn from its 
surrounding (the ion is noninteracting) . Step D— )-E is the process re-coupling the fully interacting 
ion-ligands complex to its surrounding. Let us assume-in analogy with the quantity AWi in Eq. ([T])- 
that coupling the subsystem (apo or holo) with the surroundings in steps B— t-C and D— t-E will yield 
the free energy contributions AW^p° and AW^°^°, 

Ze ! dX clY /(R; C„) e-'^Ki(^)+"-(^''^)+Er=i '^;(x»)+«ii(X)+«i,{x,y)+«,,(Y)] 

= / dX dY /(R; Cn) e-^hi W+SLi -i{x.)+nu(X)+«„(Y)] 

(13) 



and 



Zc _ JdX dY /(R; Cn) e'^E-i W(x.)+%i(x)+«rr(Y)] 

(Y)] 

(14) 



Zb J dX dY /(R; Cn) e-''Cr=i "i{xO+«ii(X)+«,.(X,Y)+«,,(Y 



(However, see the problems below with the definition of AW^^° and AW^°^°.) 

Using all the contributions from the thermodynamic cycle, we can re-express the equilibrium 
binding constant i^b as. 



Za 
Zf 

ZeJ \ZdJ \Zc ) \Zb ) \Za 



The first curly bracket is only concerned with the gas-phase QCT cluster binding free energy Kl'^'^[n) 
and the hydration free energy of the ion Gbuik- In the treatment of a binding site via the QCT for 
inhomogeneous systems, this part is clearly meant to be a central quantity that must be evaluated 
by high-level quantum mechanical methods with energy minimization and a RRHO approximation. 
However, while the contribution from the ion-ligands cluster is important, it is critical to include 
the other terms in order to account for the properties of a real protein binding site. 

All external effects (e.g., protein conformational preferences, bulk system composition, applied 
membrane electric field, surface tension, pressure, etc) are supposed to be taken into account by the 
terms contained within the second curly brackets. Here, these effects are expressed in the form of a 
probability for formation of each state must be included in the probabilities P^^°{Cn) and P'^°^°(Cn), 
and in the free energy contributions ACf" and AGJ'"^". In principle, the probabilities P''P°(C„) 
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and P'^°^°((7„) can be calculated by using simulations of a realistic all-atom model of the entire 
system, though it is worth stressing that these probabilities cannot be evaluated by considering 
only an isolated ion-ligands cluster in the gas phase. In contrast, the free energy contributions 
AW^^° and APV^'^"^", which correspond to the coupling of the subsystem to its surroundings, are 
extremely problematical. These quantities are "defined" by Eqs. (13) and (14), following the 
schematic notation with "dotted line and white shading" adopted in the Fig. 2 of ref jl21. In 
the case of ion solvation in a bulk liquid, which is the normal route in the development of QCT 
[16] . such a decoupling step is not problematic because the surroundings is constituted by solvent 
molecules that are free to adapt to the ion-ligands cluster. Hence, the quantity AWj in Eq. ([T]) is 
perfectly well defined for a cluster immersed into a bulk liquid. However, in the case of ligands that 
are covalently attached to a protein structure, this thermodynamic route is impractical. In reality, 
a protein structure is expected to be extensively disrupted when the ligand-rest coupling uir(X, Y) 
is switched off (indicated schematically by a wavy blue line in the schematic representations of 
state C and D). Therefore, even though the ion and the ligands may remain spatially restricted by 
the function I(R;Cn), the rest of the protein will distort and probably unfold as the subsystem 
of ligands is decoupled from the rest of the structure. Expressions such as Eqs. (13) and (14) can 
be formally written, but they are essentially meaningless an unusable in practice. In conclusion, 
the free energy contributions defined via the steps B— )>C and D— )>E, cannot be used to carry out a 
practical calculation based on a realistic atomic model for the purpose of evaluating the effect of 
the protein structure on an ion-ligands subsystem. 



c) Small System Grand Canonical Ensemble (SSGCE) 

Bostick and Brooks presented a statistical mechanical framework for deconstructing the determi- 
nants of selective ionic complexation in protein binding sites f8]. The theoretical developments 
lean on the concept of SSGCE considered by Reiss and co-workers |17j . and start by considering 
the statistical distribution function of the ligands around an ion interacting with a polypeptide 
comprising possible coordinating ligands. The resulting framework is closely related to QCT for 
an ion solvated in a bulk liquid |16] . However, a careful examination reveals fundamental problems 
in the case of an ion bound to a protein binding site. 

The notation of ref [8j is kept for the sake of clarity. The N ligands, located at position 
{ri, . . . , rjv}, are assumed to be chemically identical (e.g., they are all backbone carbonyl oxygens). 
The potential energy of the system is ?7(ri, . . . , r^v, R), where R denotes additional degrees of 
freedom from the solvent. Without loss of generality it is assumed that the ion is fixed at the origin 
(in the following, the fixed ion is implicitly included in the configurational integrals but is omitted 
in the notation for clarity). An inner region is defined as a spherical subvolume v within a distance 
Tc away from the ion, and a counting function, c(ri, . . . , r^v), is introduced to monitor the number 
of ligands within the inner region, c(ri, . . . , r^r) = X^i^i -f^(kj| ~ '''c) (for any configuration, the c is 
equal to the number of ligands in the inner subvolume). The spherical subvolume is analogous to 
the inner shell that is defined in pQCT (see Fig. lA, left). Defining the discrete Kroenecker delta 
function 6c,m the probability to have exactly n ligands in the inner region, P^^{n) = (<^c(ri,...,rjv),n)) 
is written as, 

pCR(^n) = ^Jdr,...Jdrr,JdK 5,(r,....,r^),„ e-f'^'-^^'-'^-'''^ (16) 
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where Z is the unrestricted configurational integral, 



Z = J dri... J dTN J dR e-/5^(ri.-.r^'R) 



(17) 



Eq. ( 16 ) can be expanded to explicitly display the contribution from all possible permutations of 



the ligands such that exactly n of those are located within the inner region, 



+ — / dri... / (ir„„i 

^ J 'm Jin J out 



+ -J ar,.. 
+ ... 



I [ dr,... [ dVn [ dVn+l ...I dVN I dR e-/3^(ri>--riV,R) 
Jin Jin J out J out J 

I dVn+1 I drn+2 . . . / dvN e-/^^(r„...,r^,R) 
Jin J out J out 

dri ... / drn-i I dvn [ dvn+i f drn+2 f dr^+s ... f dvN fdR e-'5^('-i'-''-'^'^) 

^in J out J out Jin J out J out J 

(18) 



There are A^!/n!(A^ 



n 



permutations of the ligands such that exactly n are located inside the 



inner region. If the ligands were solvent molecules in a liquid, then all those permutations would 
yield equivalent expressions of equal weight. One could choose one of the possible permutation, 



for example the first term in Eq. ( 18 ) with ligands 1 to n within the inner region and ligands 



n + 1 to outside, and then write the probability to have exactly n ligands in the inner region as 
[I71EII2I1122], 



jCR, 



m 



n] 



n\{N -n)\Z 



f dr,... f dvn I dvn+i ... I dvN I dR e-^^(^i'-'^^'^) (19) 

J in J in J out J out J 



While the relation between SSGCE and QCT were not noted by the authors, the mathematical 
steps corresponding to Eqs. ( 16p!9 ) are identical to those used to derive QCT for an ion solvated 
in a bulk liquid (see Eqs. 6-13 in ref |16j). 



The critical issue that is of concern is the use of Eq. ( |19| ) to represent the probability of finding 
n protein ligands within the inner shell region of an ion (this is Eq. (A7) in ref [8]). As illus- 
trated schematically in Fig. 2, the use of the combinatorial factor N\/n\{N — n)! in Eq. (19) to 



represent the total sum over all possible permutations expressed in Eq. (18) is strictly valid only 



if the Boltzmann factor is invariant upon the exchange (swapping) of ligand coordinates. In a 
liquid, permutation of any pair of chemically identical molecules i and j yields the same potential 
energy, f7(...,ri,.. 



.) = U{. 



'31 



and it is possible to write Eq. ( 19 ) with the 



combinatorial factor \n\ [T^ [2T1 I22j . But such a simplification is invalid when the ligands are 
covalently linked to a polypeptide. Even if ligands i and j are "chemically identical", swapping 
their coordinates should not generate an equivalent configuration with the same potential energy 
and Boltzmann weight because they are covalently attached to the polypeptide chain. The lack 
of invariance upon exchange of identical ligands is automatically satisfied, by construction, if the 
energy of the polypeptide is represented on the basis of a classical molecular mechanical force field 
with a fixed list of chemical bonds |23j . But more generally, even if the energy of the polypeptide 
were represented on the basis of a quantum mechanical Born-Oppenheimer energy surface that 
is formally invariant upon exchange of identical nuclei, restricting the configurational integral of 
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Figure 4: Schematic illustration of configurations of an ion coordinated by n ligands bonded to a polypeptide chain 
of A'^ ligands with a fixed topological structure in the Small System Grand Canonical Ensemble (SSGCE) formulation. 
The ion is depicted as a yellow circle placed at the origin, the ligands are depicted as green-red dumbbells, the covalent 
topological structure of the polypeptide backbone with its predetermined connectivity is depicted schematically as 
a thick green line, and the solvent molecules are not shown. The inner subvolume region is shown as a thin dotted 
line. For the sake of the figure, N = 8 and n = 3. In A is shown a configuration with ligands 1-3 located within 
the inner subvolume, and ligands 4-8 located outside the inner subvolume region. The energy of the configuration is 
U{ri, r2, ra, r4, rs, re, rr, rg). In B is shown a configuration obtained from A by swapping the coordinates of ligands 3 
and 4; ligands 1, 2 and 4 are within the inner subvolume region, and ligands 3, 5, 6, 7, and 8 are located outside the 
inner subvolume region. The energy of the configuration is f/(ri, r2, r4, ra, rs, rg, ry, rg), different from the energy of 
the configuration in A. Configurations A and B would strictly be equivalent if the polypeptide backbone was removed 
and the ligands were separate molecules. Configurations A-D, all with n = 3 ligands within the subvolume, provide 
additional examples of contributions to the sum over permutations expressed in Eq. ( |18[ ); A-C represents typical 
contribution to the first, second and third configurational integral in Eq. (181. All four configurational integral 
pictured schematically in A-D is expected to have a different statistical weight. Because configurational integral 
associated with A-D are expected to have different statistical weights, it is not possible in general to represent the 



total sum over all possible permutations of Eq. ( 18 1 in terms of a combinatorial factor as in Eq. ( 19 1 



Eq. (19) to a single state of covalent topological connectivity would still be necessary to formulate 
a meaningful statistical mechanical theory of ion binding to a protein site. Doing otherwise, i.e., 
constructing P^^{n) from a superposition of all configurational states including those obtained 
by swapping the coordinates of identical nuclei, implies that one is averaging over processes that 
are actually slower than any macroscopic observation timescale. Bond breaking and forming can 
occur in a polymer, but under normal conditions at room temperature, atoms and residues along a 
polypeptide chain do not spontaneously swap their position (except labile hydrogens). For example, 
it is possible to isotopically label the atoms of any carbonyl group in the gramicidin channel to 
perform NMR experiments, and such labeling can remain stable for months |24j . The permutation 



of identical ligands implicit in Eq. ( 19 ) is incorrect because a valid statistical mechanical treatment 
should only sum over the microstates that are accessible within a reasonable observation time (see 
pp. 42-43 in ref [25]). 

The statistical mechanical framework presented by Bostick and Brooks was designed to show 
how the coordination structure of an ion solvated by a liquid provides the key elements needed to 
understand the selectivity displayed by a protein binding site. To firm up this view, the authors 
analyzed MD simulations of simplified model systems comprising an ion immersed in a generic fluid 
of unrestrained ligands in the context of their statistical mechanical framework. The fluid of unre- 
strained ligands, referred to as the "soup of ligands" or "hypothetical fluid of coordinators" (HFC), 
serves as a key reference in the analysis. But the significance of the analysis based on the HFC is 
unclear. The HFC, rooted in Eq. (A7), assimilates the combinatorial factor associated with the 
configurations of an ion solvated in a liquid to that of an ion coordinated by the covalently linked 
ligands in a protein binding site. 
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By incorrectly treating protein ligands as identical solvent molecules that are free to exchange 
with one another in the bulk phase, the theoretical development presented in ref |8| creates the 
impression that information directly relevant to an ion binding site in a protein can be extracted 
from the properties of the HFC. In reality, the link between an ion-selective protein binding and 
the HFC is unclear. 

3. Summary 

While the concept of a reduced model is intuitively obvious, many choices are possible, leading to 
important differences in formulations and implementations. The general idea of a reduced model is 
that the "near" degrees of freedom must be treated explicitly while the influence of the surrounding 
must be implicitly incorporated. Frameworks based on quasi-chemical theory (QCT) |H[5l[T2] and 
on the small system grand canonical ensemble (SSGCE) development ^ have previously been 
proposed to achieve such a reduction of ion binding sites in macromolecules into reduced models. 
However, a number of problems were discovered with these frameworks. Ultimately, the major 
shortcomings of approaches based on QCT or SSGCE as formulated until now lie in their inability 
to account for the influence of the covalently linked protein structure. One consequence is that these 
formulations fail to draw a clear distinction between ion-selective protein sites that are either very 
rigid or very flexible [l.Oj. Developing a valid statistical mechanical theory of reduced models for 
ion binding sites in macromolecules remains an important and valuable goal for future efforts. 
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